Parallel buses had three major timing issues that prevented them from going much faster:
| Problem | Why It Limits Speed | PCIe Solution |
|---|---|---|
| Flight Time (signal propagation delay) | Data must arrive within one clock period. At high clock speeds, there’s not enough margin. | PCIe sends the clock inside the data stream (embedded clock). The receiver recovers the clock from data, so flight time no longer matters. |
| Clock Skew (different clock arrival times at transmitter & receiver) | Reduces timing budget and risks sampling errors. | Eliminated — the recovered clock is aligned with data. |
| Signal Skew (different data lines arrive at slightly different times) | Must wait for slowest signal before latching, limiting clock speed. | Gone — PCIe sends one bit per lane, so no intra-lane skew. (If multiple lanes are used, receiver performs lane deskew automatically.) |
This is why PCIe can scale to 2.5 GT/s, 5.0 GT/s, 8 GT/s, and beyond, something that was impractical for PCI/PCI-X.
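To make the flight-time problem concrete, here is a small back-of-the-envelope sketch. The numbers are illustrative assumptions, not from the text above: roughly 170 ps/inch of propagation delay on an FR-4 trace, a 10-inch shared bus trace, and representative PCI/PCI-X clock rates. It shows how much of each clock period the flight time alone consumes, before clock skew, signal skew, and setup/hold margins are even counted.

```python
# Illustrative timing-budget sketch for a shared parallel bus.
# Assumed numbers (not from the text): ~170 ps/inch propagation delay
# on FR-4, a 10-inch trace, and representative parallel-bus clock rates.

PROP_DELAY_PS_PER_INCH = 170   # assumed typical FR-4 trace delay
TRACE_LENGTH_IN = 10           # assumed shared-bus trace length

flight_time_ns = PROP_DELAY_PS_PER_INCH * TRACE_LENGTH_IN / 1000.0

for name, clock_mhz in [("PCI 33 MHz", 33), ("PCI 66 MHz", 66), ("PCI-X 133 MHz", 133)]:
    period_ns = 1000.0 / clock_mhz
    share = flight_time_ns / period_ns * 100
    print(f"{name}: period {period_ns:.1f} ns, "
          f"flight time {flight_time_ns:.1f} ns ({share:.0f}% of the period)")
```

With these assumed values, flight time grows from about 6% of the clock period at 33 MHz to roughly 23% at 133 MHz, which is why pushing a parallel bus much faster leaves too little margin. PCIe sidesteps the problem entirely by embedding the clock in the data.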
Bandwidth Math
PCIe combines:
- High frequency signaling (2.5 GT/s → 8 GT/s)
- Multiple lanes per Link (x1, x2, x4, x8, x16, x32)
- Full-duplex communication (send + receive simultaneously)
So effective bandwidth = bit rate × encoding efficiency × lane count × 2 (for full duplex)
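As a quick sanity check on this formula, here is a minimal Python sketch. The function name and parameters are my own choices for illustration, and the model counts only line rate and encoding efficiency, ignoring packet and protocol overhead:

```python
def pcie_bandwidth_gbps(bit_rate_gt_s, encoding_efficiency, lanes):
    """Return (per-direction, full-duplex aggregate) throughput in GB/s.

    bit_rate_gt_s       -- raw line rate per lane (GT/s)
    encoding_efficiency -- 0.8 for 8b/10b, 128/130 for 128b/130b
    lanes               -- link width (1, 2, 4, 8, 16, 32)
    """
    per_direction = bit_rate_gt_s * encoding_efficiency * lanes / 8  # bits -> bytes
    return per_direction, per_direction * 2  # x2 for full duplex

# Gen1 x1: matches the worked example below
print(pcie_bandwidth_gbps(2.5, 0.8, 1))   # (0.25, 0.5) GB/s
```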
Example for Gen1 (x1 link):
- Bit rate: 2.5 GT/s
- Encoding: 8b/10b (8 bits of data carried in 10 bits on the wire) → 80% efficiency (20% lost to encoding overhead)
- Effective data rate: 2.5 GT/s × 0.8 = 2.0 Gbit/s
- Throughput: 2.0 Gbit/s ÷ 8 = 0.25 GB/s per direction
- Full-duplex aggregate: 0.5 GB/s total.
Gen2 doubles this (5.0 GT/s → 0.5 GB/s per direction per lane).
Gen3 raises the rate to 8 GT/s and switches to 128b/130b encoding (~98.5% efficiency), nearly doubling bandwidth again (~0.985 GB/s per direction per lane).
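Putting the three generations side by side, the sketch below uses the same simple model as above (per-lane values, encoding efficiency only, packet/protocol overhead ignored) and reproduces the doubling from Gen1 to Gen2 and the near-doubling from Gen2 to Gen3:

```python
# Per-lane throughput for Gen1-Gen3 from line rate and encoding efficiency only.
generations = [
    ("Gen1", 2.5, 8 / 10),     # 2.5 GT/s, 8b/10b
    ("Gen2", 5.0, 8 / 10),     # 5.0 GT/s, 8b/10b
    ("Gen3", 8.0, 128 / 130),  # 8.0 GT/s, 128b/130b
]

for name, rate_gt_s, efficiency in generations:
    per_dir_gbytes = rate_gt_s * efficiency / 8   # GT/s -> GB/s per direction
    print(f"{name}: {per_dir_gbytes:.3f} GB/s per lane per direction, "
          f"{2 * per_dir_gbytes:.3f} GB/s full duplex")
```

This prints roughly 0.250, 0.500, and 0.985 GB/s per lane per direction for Gen1, Gen2, and Gen3; multiply by the lane count (x1 through x32) to get the figures for a wider Link.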